Intrinsic dimension estimation of data by principal component analysis

Authors

  • Mingyu Fan
  • Nannan Gu
  • Hong Qiao
  • Bo Zhang
Abstract

Estimating the intrinsic dimensionality of data is a classic problem in pattern recognition and statistics. Principal Component Analysis (PCA) is a powerful tool for discovering the dimensionality of data sets with a linear structure; it becomes ineffective, however, when the data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate the intrinsic dimension of data with nonlinear structures. Our method works by first finding a minimal cover of the data set, then performing PCA locally on each subset in the cover, and finally producing the estimate by examining the data variance on all small neighborhood regions. The proposed method uses the whole data set to estimate its intrinsic dimension and is convenient for incremental learning. In addition, our new PCA procedure can filter out noise in the data and converges to a stable estimate as the neighborhood region size increases. Experiments on synthetic and real-world data sets show the effectiveness of the proposed method.
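The cover-then-local-PCA idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the greedy k-nearest-neighbor cover stands in for the minimal cover, and the 95% cumulative-variance cutoff and majority vote are assumptions made for illustration. The function names `pca_dimension` and `local_pca_dimension` and the parameters `k` and `var_threshold` are hypothetical.

```python
import numpy as np


def pca_dimension(points, var_threshold=0.95):
    """Smallest number of principal components whose cumulative variance
    ratio reaches var_threshold (an assumed criterion, not the paper's)."""
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the variance per component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0.0:
        return 0  # all points coincide
    ratios = np.cumsum(variances) / total
    count = int(np.searchsorted(ratios, var_threshold)) + 1
    return min(count, len(variances))


def local_pca_dimension(data, k=20, var_threshold=0.95):
    """Greedy cover by k-nearest-neighbor subsets, PCA on each subset,
    then a majority vote over the local estimates (all assumptions)."""
    n = data.shape[0]
    uncovered = np.ones(n, dtype=bool)
    local_dims = []
    while uncovered.any():
        # Take the first point not yet covered as the next subset center.
        center = int(np.flatnonzero(uncovered)[0])
        distances = np.linalg.norm(data - data[center], axis=1)
        neighbors = np.argsort(distances)[:k]
        local_dims.append(pca_dimension(data[neighbors], var_threshold))
        uncovered[neighbors] = False
        uncovered[center] = False  # guarantees progress even with duplicates
    return int(np.bincount(local_dims).argmax())


# A helix is a one-dimensional curve embedded in three dimensions.
t = np.linspace(0.0, 4.0 * np.pi, 1000)
helix = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])

print("global PCA estimate:", pca_dimension(helix))
print("local PCA estimate:", local_pca_dimension(helix, k=20))
```

On this noise-free helix, global PCA needs all three components to reach the cutoff, while the local estimate recovers one, which is the nonlinear-structure failure the abstract describes. The result depends on the choice of `k` and `var_threshold`; the paper's own variance check and noise filtering are not reproduced here.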

Similar resources

Intrinsic Dimension Estimation by Maximum Likelihood in Probabilistic PCA

A central issue in dimension reduction is choosing a sensible number of dimensions to be retained. This work demonstrates the asymptotic consistency of the maximum likelihood criterion for determining the intrinsic dimension of a dataset in an isotropic version of Probabilistic Principal Component Analysis (PPCA). Numerical experiments on simulated and real datasets show that the maximum likelih...

Estimating the intrinsic dimensionality of hyperspectral images

Estimating the intrinsic dimensionality (ID) of an intrinsically low (d-) dimensional data set embedded in a high (n-) dimensional input space by conventional Principal Component Analysis (PCA) is computationally hard because PCA scales cubically (O(n^3)) with the input dimension [11]. Besides this computational drawback, global PCA will overestimate the ID if the data manifold is curved. In this pap...

Feature Dimension Reduction of Multisensor Data Fusion using Principal Component Fuzzy Analysis

These days, much of the most important research across many applications and tools focuses on how to achieve awareness. One serious application is awareness of the behavior and activities of patients, which matters because individuals need ubiquitous medical care. It is sometimes very important that the doctor knows the patient's physical condition. O...

Forecasting Financial Time Series through Intrinsic Dimension Estimation and Non-Linear Data Projection

A crucial problem in non-linear time series forecasting is to determine its auto-regressive order, in particular when the prediction method is non-linear. We show in this paper that this problem is related to the fractal dimension of the time series, and suggest using the Curvilinear Component Analysis (CCA) to project the data in a non-linear way on a space of adequately chosen dimension, befo...

Intrinsic Dimensionality Estimation in Visualizing Toxicity Data

Over the years, a number of dimensionality reduction techniques have been proposed and used in chemoinformatics to perform nonlinear mappings. Nevertheless, data visualization techniques can be applied efficiently for dimensionality reduction mainly when the data are not truly high-dimensional and can be represented as a nonlinear low-dimensional manifold, when it is possible to reduce...

Journal title:
  • CoRR

Volume abs/1002.2050

Publication date: 2010